Objective: to provide the basic knowledge needed for data analysis and interpretation using the R Project for Statistical Computing.
Contents:
The use of a programming language is useful when:
R is a language and environment for statistical computing and graphics.
R is open-source software and can be downloaded for Linux, macOS, or Windows.
A Graphical User Interface allows the users to interact with electronic devices through graphical icons instead of text-based user interfaces.
RStudio is an integrated development environment for R, which is available in open source and commercial editions and runs on Windows, Mac, and Linux.
An R script is simply a text file containing (almost) the same commands that would be entered on the R command line. It allows the user to run the same set of actions repeatedly.
Packages are collections of R functions, data, and compiled code in a well-defined format.
As an example, we can install the terra package:
Installed packages must be loaded before their functions can be used. The library function is used for this purpose.
To remove an existing package:
To visualise the installed packages and the packages in use:
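The package steps above can be sketched together as follows. This is a minimal sketch rather than the lecture's own code: the install and remove calls are commented out so that running it does not change the local library, `stats` is used for loading only because it ships with R, and the names `installed_pkgs` and `attached_pkgs` are illustrative.

```r
# Install a package once per machine (needs internet access).
# Commented out here so the sketch does not modify the local library.
# install.packages("terra")

# Load an installed package for the current session.
# 'stats' ships with R, so this line works without installing anything.
library(stats)

# Remove a package from the local library (commented out for safety).
# remove.packages("terra")

# Names of all packages installed on this machine.
installed_pkgs <- rownames(installed.packages())
head(installed_pkgs)

# Packages currently attached (in use) in this session.
attached_pkgs <- search()
print(attached_pkgs)
```

After `install.packages("terra")` has been run once, `library(terra)` would load it in the same way.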
R is an interpreted language
The result of each command can be:
R is an expression language and is case sensitive; therefore x and X refer to different objects.
Comments make it clear what each part of a script does. A comment always begins with the # symbol.
To recall and execute previous commands, you can use the up and down arrow keys on the keyboard. This makes it easy to repeat or correct previously executed commands.
The data can be either visualised on the screen:
Stored in an object:
Or visualised on the screen by a plot:
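A minimal sketch of the three cases, using made-up values; the object names `root` and `values` are assumptions chosen for illustration.

```r
# 1. Display a result directly on the screen.
sqrt(16)

# 2. Store the result in an object (nothing is printed).
root <- sqrt(16)

# Printing the object shows its stored value.
print(root)

# 3. Display data as a plot.
values <- c(2, 4, 6, 8, 10)
plot(values)
```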
There are several ways to assign a value to a variable. The most common is the <- operator, which is composed of the less-than character (<) and the minus character (-).
The = operator can be used as well.
Additionally, the assign function can also be used.
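The three forms of assignment can be compared side by side. The variable names and values below are arbitrary and serve only as an illustration.

```r
# Assignment with the arrow operator (the most common form).
x <- 5

# Assignment with the equals sign.
y = 10

# Assignment with the assign function: the name is given as a string.
assign("z", 15)

print(x)
print(y)
print(z)
```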
A numeric vector is an ordered collection of numbers. To create a vector, use the c() function.
Similarly, we can create vectors that contain characters:
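As a small illustration, the sketch below builds one numeric and one character vector with c(). The names `depths` and `metals` and their contents are hypothetical.

```r
# Numeric vector: an ordered collection of numbers.
depths <- c(1.5, 3, 4.5, 6)
print(depths)

# Character vector: each element is a text string in quotes.
metals <- c("Cadmium", "Lead", "Zinc")
print(metals)
```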
Sometimes, we need to create long vectors that follow a known sequence. For example:
For this purpose we can use the : symbol, which will create a sequence of consecutive integers.
Another option is to use the seq function, which will create a sequence according to the desired step.
seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)),
    length.out = NULL, along.with = NULL, ...)
For example:
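A short sketch of both approaches; the object names are illustrative. The last call uses length.out instead of by, so that R works out the step itself.

```r
# Consecutive integers from 1 to 10 with the colon operator.
one_to_ten <- 1:10
print(one_to_ten)

# Sequence from 0 to 100 in steps of 10.
tens <- seq(from = 0, to = 100, by = 10)
print(tens)

# Five equally spaced values between 0 and 1 (step computed by R).
five_points <- seq(from = 0, to = 1, length.out = 5)
print(five_points)
```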
Objects are characterised by their name, content, and attributes. The mode function identifies the kind of elements stored in an object, and each object has only one mode. There are four principal (atomic) modes: numeric, character, complex, and logical.
The length function returns the number of elements in an object.
Which mode and length correspond to these elements?
| Object | Mode | Length |
|---|---|---|
| 58i | | |
| "Cadmium" | | |
| 1:10 | | |
| TRUE | | |
Answers:
| Object | Mode | Length |
|---|---|---|
| 58i | complex | 1 |
| "Cadmium" | character | 1 |
| 1:10 | numeric | 10 |
| TRUE | logical | 1 |
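The table can be checked directly in R. In this sketch the results are collected into two vectors, `modes` and `lengths`, whose names are chosen only for illustration.

```r
# Mode of each element from the table.
modes <- c(mode(58i), mode("Cadmium"), mode(1:10), mode(TRUE))
print(modes)

# Length of each element from the table.
lengths <- c(length(58i), length("Cadmium"), length(1:10), length(TRUE))
print(lengths)
```

Note that "Cadmium" must be written in quotes; without them R would look for an object named Cadmium.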
All objects in R have a class, which can be retrieved with the class function. For simple objects, the class is usually the same as the mode (e.g., “numeric”, “character”, and “logical”). For more complex objects, the class will differ (e.g., “list”, “array”, “matrix”, and “data.frame”).
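A few calls to class illustrate the difference between simple and more complex objects. The values passed in are arbitrary examples.

```r
# Simple objects: the class matches the mode.
class(3.14)          # "numeric"
class("Cadmium")     # "character"
class(TRUE)          # "logical"

# More complex objects have their own classes.
class(list(1, "a"))  # "list"
class(data.frame())  # "data.frame"
```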
In some cases, there will be missing data in a set of values. These missing values are represented by the symbol NA. Please note that NA is not a character string.
Inf and -Inf represent positive and negative infinity, for example the result of dividing a non-zero number by zero.
NaN (Not a Number) represents the result of an undefined numeric operation, such as 0/0.
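The special values can be produced and tested as in the sketch below. The vector `readings` is hypothetical, and the na.rm argument of mean is shown as one common way of skipping missing values.

```r
# A vector with one missing value.
readings <- c(4.2, NA, 5.1)

# is.na flags the missing element.
is.na(readings)               # FALSE  TRUE FALSE

# Any calculation that includes NA returns NA ...
mean(readings)                # NA

# ... unless missing values are removed first.
mean(readings, na.rm = TRUE)  # 4.65

# Infinite and undefined results.
pos_inf <- 1 / 0              # Inf
neg_inf <- -1 / 0             # -Inf
undefined <- 0 / 0            # NaN
is.nan(undefined)             # TRUE
```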
abs: absolute value of the object.
^n: raise to the power of n.
sqrt: square root of the object.
log: natural logarithm.
exp: exponential function (the antilogarithm of the natural logarithm).
rep: creates a vector by repeating the object n times.
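Each of the functions listed above can be tried on a simple value. The inputs below are arbitrary and were chosen so that the results are easy to check by hand.

```r
abs(-7.5)          # 7.5
2^10               # 1024
sqrt(81)           # 9
log(exp(2))        # 2 (log undoes exp)
exp(0)             # 1
rep(3, times = 4)  # 3 3 3 3
```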
basename: returns the part of the file path after the last separator (/).
data.path <- "C:/Users/Documents/WS_Sam_Neua/WS1_proj.tif"
basename(data.path)
## [1] "WS1_proj.tif"
dirname: returns the part of the file path before the last separator (/).
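As a counterpart to the basename example, dirname can be applied to the same path. The object name `folder` is an assumption for illustration.

```r
data.path <- "C:/Users/Documents/WS_Sam_Neua/WS1_proj.tif"

# Everything before the last separator: the folder that holds the file.
folder <- dirname(data.path)
print(folder)
```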
paste: joins multiple text strings into one, separated by a space by default.
paste0: joins multiple text strings into one with no separator.
file.path: similar to paste and paste0, but inserts a forward slash between the inputs.
substr: used to extract (or replace) substrings in a character vector.
For this second example, we could combine the basename function as follows:
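The string functions can be sketched on the same path used for basename. The strings being joined and the object names (`label`, `file_name`, `full_path`, `site_code`) are illustrative assumptions, not the lecture's original examples.

```r
data.path <- "C:/Users/Documents/WS_Sam_Neua/WS1_proj.tif"

# paste joins strings with a space by default.
label <- paste("Watershed", "WS1")           # "Watershed WS1"

# paste0 joins strings with nothing in between.
file_name <- paste0("WS1", "_proj", ".tif")  # "WS1_proj.tif"

# file.path joins its inputs with forward slashes.
full_path <- file.path("C:/Users/Documents", "WS_Sam_Neua", file_name)
print(full_path)

# substr combined with basename: first three characters of the file name.
site_code <- substr(basename(data.path), start = 1, stop = 3)
print(site_code)                             # "WS1"
```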
To list files inside a folder, the function list.files can be used. Please copy the path from a folder on your computer.
list.files(path = ".", pattern = NULL, all.files = FALSE,
           full.names = FALSE, recursive = FALSE,
           ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
For example:
## [1] "Italy.csv" "Spain.csv"
## [1] "../Data/L2_Introduction_II_Data/Cities/Italy.csv"
## [2] "../Data/L2_Introduction_II_Data/Cities/Spain.csv"
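To try list.files without depending on a particular folder, the sketch below first creates a temporary folder with three empty files. The folder and the file `notes.txt` are hypothetical; the pattern argument keeps only names ending in .csv, and full.names = TRUE returns complete paths instead of bare file names.

```r
# Create a temporary folder with a few empty files (illustration only).
demo_dir <- file.path(tempdir(), "Cities")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("Italy.csv", "Spain.csv", "notes.txt"))))

# File names only, restricted to CSV files.
csv_names <- list.files(path = demo_dir, pattern = "\\.csv$")
print(csv_names)

# Same files, returned with their full paths.
csv_paths <- list.files(path = demo_dir, pattern = "\\.csv$", full.names = TRUE)
print(csv_paths)
```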
Sys.time: returns the current date and time. This can be useful when running long scripts, because storing the time before and after a step lets you work out how long that step, or the entire script, took to execute.
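A minimal timing sketch: the sum below is only a stand-in for a long-running step, and the object names are assumptions.

```r
# Record the time before the step.
start_time <- Sys.time()

# Stand-in for a long computation.
total <- sum(sqrt(1:1e6))

# Record the time after the step and take the difference.
end_time <- Sys.time()
elapsed <- end_time - start_time
print(elapsed)
```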
t: transposes a matrix.
example <- matrix(c(1:20), nrow = 5)
print(example)
## [,1] [,2] [,3] [,4]
## [1,] 1 6 11 16
## [2,] 2 7 12 17
## [3,] 3 8 13 18
## [4,] 4 9 14 19
## [5,] 5 10 15 20
example.trans <- t(example)
print(example.trans)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 2 3 4 5
## [2,] 6 7 8 9 10
## [3,] 11 12 13 14 15
## [4,] 16 17 18 19 20
cumsum: gives the cumulative sum from a vector.
log10: logarithm base 10.
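Both functions can be checked on small inputs; the values are arbitrary.

```r
# Running total of the values 1 to 5.
running_total <- cumsum(c(1, 2, 3, 4, 5))
print(running_total)  # 1 3 6 10 15

# Base-10 logarithm: 10 raised to the power 3 gives 1000.
log10(1000)           # 3
```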
The function help.start will launch a Web browser that allows the help pages to be browsed with hyperlinks.
You don’t need to memorise the arguments of every function. To open the R documentation for a specific function, type a question mark (?) before the function name.
Additionally, the double question marks (??) search the R help files for a word or phrase.
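The help commands can be sketched as follows, using seq as the example function and "cumulative sum" as an arbitrary search phrase. How the pages are displayed depends on the environment (RStudio help pane, browser, or console).

```r
# Open the help page of a function: two equivalent forms.
?seq
help("seq")

# Search the installed help files for a phrase: two equivalent forms.
??"cumulative sum"
help.search("cumulative sum")
```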
Two vectors can be added, subtracted, multiplied and divided:
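Arithmetic on vectors works element by element. The vectors `a` and `b` below are hypothetical and have the same length.

```r
a <- c(10, 20, 30)
b <- c(2, 4, 5)

a + b  # 12 24 35
a - b  #  8 16 25
a * b  # 20 80 150
a / b  #  5  5  6
```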
range: gives the minimum and maximum values from a vector.
The functions max, min, mean, and var give the maximum, minimum, mean and variance values from a vector, respectively.
sum: gives the sum of any vector.
prod: gives the product of any vector.
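The summary functions can be tried on a small vector whose results are easy to verify by hand. The vector `v` is an arbitrary example.

```r
v <- c(2, 4, 6, 8)

range(v)  # 2 8
max(v)    # 8
min(v)    # 2
mean(v)   # 5
var(v)    # 6.666667 (sample variance)
sum(v)    # 20
prod(v)   # 384
```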
sort: sorts the values in increasing order (the default) or in decreasing order (decreasing = TRUE).
y <- c(5, 3, 6, 8, 3, -1)
sort(y)
## [1] -1 3 3 5 6 8
sort(y, decreasing = TRUE)
## [1] 8 6 5 3 3 -1
Please sort the following vector in increasing and decreasing order:
5, 9, 12, -10, -5, 3, 900
order: this works similarly to the sort function, but rather than returning the reordered values themselves, it returns the positions of these values.
When can we use order instead of sort?
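One possible use, sketched under assumptions: order is helpful when a second vector has to be rearranged in the same way as the first. The vector `y` is the one from the sort example; `sites` is a hypothetical vector of labels that belong to the values in `y`.

```r
y <- c(5, 3, 6, 8, 3, -1)
sites <- c("A", "B", "C", "D", "E", "F")

# Positions of the values of y, from smallest to largest.
positions <- order(y)
print(positions)  # 6 2 5 1 3 4

# Indexing y with these positions gives the same result as sort(y).
y[positions]      # -1 3 3 5 6 8

# The same positions reorder the labels so that they still match y.
sites[positions]  # "F" "B" "E" "A" "C" "D"
```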
Square brackets ([]) can be used to subset elements from a vector.
To extract everything but one or more selected element or elements, we can use the minus symbol inside the square brackets:
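A short sketch of positive and negative indexing on a hypothetical vector `v`:

```r
v <- c(10, 20, 30, 40, 50)

# Extract by position.
v[2]          # 20
v[c(1, 3)]    # 10 30
v[2:4]        # 20 30 40

# Negative positions drop elements instead of selecting them.
v[-1]         # 20 30 40 50
v[-c(1, 2)]   # 30 40 50
```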
From the same vector please extract:
To show all the unique values in a vector, we can use the unique function.
To find the position of a specific value (e.g., a number or a character string) in a vector, the which function is used. If the requested value does not occur in the vector, which returns an empty vector.
To find the positions of values that are not equal to a specific value, we can use the which function together with the logical operator “!=” (not equal to).
Note: when you want to search for multiple values within an object, it is better to use the %in% operator (see Lecture 4).
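These functions can be sketched on a small hypothetical vector `v` that contains repeated values:

```r
v <- c(1, 2, 2, 3, 3, 3)

# Distinct values, in order of first appearance.
unique(v)               # 1 2 3

# Positions where the value equals 3.
which(v == 3)           # 4 5 6

# A value that does not occur gives an empty result.
which(v == 99)          # integer(0)

# Positions where the value is not 3.
which(v != 3)           # 1 2 3

# Positions matching any of several values.
which(v %in% c(1, 3))   # 1 4 5 6
```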
A matrix is a two-dimensional array. It can be created with the matrix function.
With the rnorm function, we can create random values with a normal distribution. We can use this function to create a matrix of 60 random values:
# Matrix with 12 rows and 5 columns and random values
x <- matrix(rnorm(60), nrow = 12)
print(x)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.34185154 0.20359985 -0.005034602 -0.76186135 -1.0489687
## [2,] 0.27001872 1.64266804 1.361292452 0.10048312 -0.8730561
## [3,] 0.35729918 -1.02312092 1.514999869 0.42742738 -0.4395550
## [4,] 2.02984054 0.40941469 1.078059341 1.12477940 0.3196170
## [5,] -0.29178567 -0.31932897 -1.233155710 -0.48092053 0.3287599
## [6,] -1.33367146 -0.03555508 0.008222698 -0.33403233 -0.1328199
## [7,] -1.07762081 -1.01259814 0.804670049 0.53818671 -1.8376158
## [8,] -0.68245981 -0.92572637 0.487013911 -0.13250592 0.1971838
## [9,] -0.44350603 0.82192561 0.792913088 0.65469014 -0.9422564
## [10,] 0.48617600 -0.05174419 -0.700349933 -0.05280361 0.4678160
## [11,] 2.10351000 0.42367104 0.742110651 0.48480483 0.2923420
## [12,] -0.01478887 0.47969944 -0.045532817 -1.63538774 1.8060488
Extracting values from a matrix is similar to extracting values from a vector; however, a matrix has two dimensions [row number, column number].
x[9, 3]
## [1] 0.7929131
x[, 1]
## [1] 1.34185154 0.27001872 0.35729918 2.02984054 -0.29178567 -1.33367146
## [7] -1.07762081 -0.68245981 -0.44350603 0.48617600 2.10351000 -0.01478887
x[3, ]
## [1] 0.3572992 -1.0231209 1.5149999 0.4274274 -0.4395550
x[c(1, 2), 4]
## [1] -0.7618614 0.1004831
x[c(3, 4), c(1, 2)]
## [,1] [,2]
## [1,] 0.3572992 -1.0231209
## [2,] 2.0298405 0.4094147
A list is an ordered collection of objects known as components, which can be of different classes.
The components of a list are always numbered. To access each component of a list:
Also, they can be accessed using their respective position:
Now that we have the numbers for the dinos.eaten, how can we extract the 12?
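The general pattern for reaching into a list can be sketched with a hypothetical list; `station` and its components are invented for illustration and are not the list used in the lecture. A component is first selected by name or position, and a single value is then taken from it with square brackets.

```r
# Hypothetical list with components of different classes.
station <- list(name = "Gauge A", levels = c(3, 12, 7), active = TRUE)

# Access a component by name.
station$levels        # 3 12 7
station[["levels"]]   # 3 12 7

# Access the same component by position.
station[[2]]          # 3 12 7

# Extract one value from the vector stored in that component.
station$levels[2]     # 12
station[[2]][2]       # 12
```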
A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.
To construct a data frame, the function data.frame is used. As an example, we have the following information:
| River | Length (km) | Discharge (m³/s) |
|---|---|---|
| Congo | 4370 | 41200 |
| Mekong | 4023 | 16000 |
| Rhine | 1233 | 2900 |
Using the $ symbol, we can extract data from a certain column.
The $ sign can also be used to create a new data column. Here, we will convert discharge from m³/s into BMC/year and store it in a new data column.
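The data frame steps can be sketched with the river table above. Several things here are assumptions: the object and column names (`rivers`, `Length`, `Discharge`, `Discharge_year`), the reading of BMC as billion cubic metres, and a year of 365 days.

```r
# Build the data frame from the table (lengths in km, discharge in m^3/s).
rivers <- data.frame(
  River = c("Congo", "Mekong", "Rhine"),
  Length = c(4370, 4023, 1233),
  Discharge = c(41200, 16000, 2900)
)

# Extract one column with the $ symbol.
rivers$Length

# Number of seconds in a 365-day year.
seconds_per_year <- 60 * 60 * 24 * 365

# New column: m^3/s times seconds per year, divided by 1e9
# to express the volume in billion cubic metres per year.
rivers$Discharge_year <- rivers$Discharge * seconds_per_year / 1e9
print(rivers)
```

For the Congo this gives 41200 × 31,536,000 / 10⁹ ≈ 1299.28 billion cubic metres per year.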